AITopics | perception performance

Collaborating Authors

perception performance

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps

Neural Information Processing SystemsApr-25-2026, 00:48:07 GMT

Multi-agent collaborative perception could significantly upgrade the perception performance by enabling agents to share complementary information with each other through communication. It inevitably results in a fundamental trade-off between perception performance and communication bandwidth. To tackle this bottleneck issue, we propose a spatial confidence map, which reflects the spatial heterogeneity of perceptual information. It empowers agents to only share spatially sparse, yet perceptually critical information, contributing to where to communicate. Based on this novel spatial confidence map, we propose Where2comm, a communication-efficient collaborative perception framework. Where2comm has two distinct advantages: i) it considers pragmatic compression and uses less communication to achieve higher perception performance by focusing on perceptually critical areas; and ii) it can handle varying communication bandwidth by dynamically adjusting spatial areas involved in communication. To evaluate Where2comm, we consider 3D object detection in both real-world and simulation scenarios with two modalities (camera/LiDAR) and two agent types (cars/drones) on four datasets: OPV2V, V2X-Sim, DAIR-V2X, and our original CoPerception-UAVs. Where2comm consistently outperforms previous methods; for example, it achieves more than 100,000 lower communication volume and still outperforms DiscoNet and V2X-ViT on OPV2V.

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Country: Asia > China (0.14)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.91)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.46)

Add feedback

1f5c5cd01b864d53cc5fa0a3472e152e-Paper-Conference.pdf

Neural Information Processing SystemsFeb-7-2026, 20:15:58 GMT

agent, information, spatial confidence map, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.14)
Asia > China > Shanghai > Shanghai (0.05)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.94)

Add feedback

Where2comm: Communication-Efficient Collaborative Perception via Spatial Confidence Maps

Neural Information Processing SystemsDec-23-2025, 21:32:15 GMT

Multi-agent collaborative perception could significantly upgrade the perception performance by enabling agents to share complementary information with each other through communication. It inevitably results in a fundamental trade-off between perception performance and communication bandwidth. To tackle this bottleneck issue, we propose a spatial confidence map, which reflects the spatial heterogeneity of perceptual information. It empowers agents to only share spatially sparse, yet perceptually critical information, contributing to where to communicate. Based on this novel spatial confidence map, we propose Where2comm, a communication-efficient collaborative perception framework. Where2comm has two distinct advantages: i) it considers pragmatic compression and uses less communication to achieve higher perception performance by focusing on perceptually critical areas; and ii) it can handle varying communication bandwidth by dynamically adjusting spatial areas involved in communication. To evaluate Where2comm, we consider 3D object detection in both real-world and simulation scenarios with two modalities (camera/LiDAR) and two agent types (cars/drones) on four datasets: OPV2V, V2X-Sim, DAIR-V2X, and our original CoPerception-UAVs. Where2comm consistently outperforms previous methods; for example, it achieves more than $100,000 \times$ lower communication volume and still outperforms DiscoNet and V2X-ViT on OPV2V.

communication-efficient collaborative perception, name change, where2comm, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.77)

Add feedback

When Autonomous Vehicle Meets V2X Cooperative Perception: How Far Are We?

Guo, An, Zhang, Shuoxiao, Tang, Enyi, Gao, Xinyu, Pang, Haomin, Tian, Haoxiang, Mu, Yanzhou, Wen, Wu, Fang, Chunrong, Chen, Zhenyu

arXiv.org Artificial IntelligenceSep-30-2025

With the tremendous advancement of deep learning and communication technology, Vehicle-to-Everything (V2X) cooperative perception has the potential to address limitations in sensing distant objects and occlusion for a single-agent perception system. V2X cooperative perception systems are software systems characterized by diverse sensor types and cooperative agents, varying fusion schemes, and operation under different communication conditions. Therefore, their complex composition gives rise to numerous operational challenges. Furthermore, when cooperative perception systems produce erroneous predictions, the types of errors and their underlying causes remain insufficiently explored. To bridge this gap, we take an initial step by conducting an empirical study of V2X cooperative perception. To systematically evaluate the impact of cooperative perception on the ego vehicle's perception performance, we identify and analyze six prevalent error patterns in cooperative perception systems. We further conduct a systematic evaluation of the critical components of these systems through our large-scale study and identify the following key findings: (1) The LiDAR-based cooperation configuration exhibits the highest perception performance; (2) Vehicle-to-infrastructure (V2I) and vehicle-to-vehicle (V2V) communication exhibit distinct cooperative perception performance under different fusion schemes; (3) Increased cooperative perception errors may result in a higher frequency of driving violations; (4) Cooperative perception systems are not robust against communication interference when running online. Our results reveal potential risks and vulnerabilities in critical components of cooperative perception systems. We hope that our findings can better promote the design and repair of cooperative perception systems.

artificial intelligence, information fusion, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2509.24927

Country:

Europe (0.93)
Asia > China (0.68)
North America > United States > California > Los Angeles County (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry:

Transportation > Ground > Road (1.00)
Automobiles & Trucks (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Add feedback

Beyond BEV: Optimizing Point-Level Tokens for Collaborative Perception

Li, Yang, Yuan, Quan, Luo, Guiyang, Fu, Xiaoyuan, Pan, Rui, Yang, Yujia, Shao, Congzhang, Liu, Yuewen, Li, Jinglin

arXiv.org Artificial IntelligenceAug-28-2025

Collaborative perception allows agents to enhance their perceptual capabilities by exchanging intermediate features. Existing methods typically organize these intermediate features as 2D bird's-eye-view (BEV) representations, which discard critical fine-grained 3D structural cues essential for accurate object recognition and localization. To this end, we first introduce point-level tokens as intermediate representations for collaborative perception. However, point-cloud data are inherently unordered, massive, and position-sensitive, making it challenging to produce compact and aligned point-level token sequences that preserve detailed structural information. Therefore, we present CoPLOT, a novel Collaborative perception framework that utilizes Point-Level Optimized Tokens. It incorporates a point-native processing pipeline, including token reordering, sequence modeling, and multi-agent spatial alignment. A semantic-aware token reordering module generates adaptive 1D reorderings by leveraging scene-level and token-level semantic information. A frequency-enhanced state space model captures long-range sequence dependencies across both spatial and spectral domains, improving the differentiation between foreground tokens and background clutter. Lastly, a neighbor-to-ego alignment module applies a closed-loop process, combining global agent-level correction with local token-level refinement to mitigate localization noise. Extensive experiments on both simulated and real-world datasets show that CoPLOT outperforms state-of-the-art models, with even lower communication and computation overhead. Code will be available at https://github.com/CheeryLeeyy/CoPLOT.

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2508.19638

Country: North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Generalist Forecasting with Frozen Video Models via Latent Diffusion

Walker, Jacob C, Vélez, Pedro, Cabrera, Luisa Polania, Zhou, Guangyao, Kabra, Rishabh, Doersch, Carl, Ovsjanikov, Maks, Carreira, João, Ginosar, Shiry

arXiv.org Artificial IntelligenceJul-21-2025

Forecasting what will happen next is a critical skill for general-purpose systems that plan or act in the world at different levels of abstraction. In this paper, we identify a strong correlation between a vision model's perceptual ability and its generalist forecasting performance over short time horizons. This trend holds across a diverse set of pretrained models-including those trained generatively-and across multiple levels of abstraction, from raw pixels to depth, point tracks, and object motion. The result is made possible by a novel generalist forecasting framework that operates on any frozen vision backbone: we train latent diffusion models to forecast future features in the frozen representation space, which are then decoded via lightweight, task-specific readouts. To enable consistent evaluation across tasks, we introduce distributional metrics that compare distributional properties directly in the space of downstream tasks and apply this framework to nine models and four tasks. Our results highlight the value of bridging representation learning and generative modeling for temporally grounded video understanding.

artificial intelligence, forecasting, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2507.13942

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Fast2comm:Collaborative perception combined with prior knowledge

Zhang, Zhengbin, Wu, Yan, Zhang, Hongkun

arXiv.org Artificial IntelligenceMay-5-2025

Collaborative perception has the potential to significantly enhance perceptual accuracy through the sharing of complementary information among agents. However, real-world collaborative perception faces persistent challenges, particularly in balancing perception performance and bandwidth limitations, as well as coping with localization errors. To address these challenges, we propose Fast2comm, a prior knowledge-based collaborative perception framework. Specifically, (1)we propose a prior-supervised confidence feature generation method, that effectively distinguishes foreground from background by producing highly discriminative confidence features; (2)we propose GT Bounding Box-based spatial prior feature selection strategy to ensure that only the most informative prior-knowledge features are selected and shared, thereby minimizing background noise and optimizing bandwidth efficiency while enhancing adaptability to localization inaccuracies; (3)we decouple the feature fusion strategies between model training and testing phases, enabling dynamic bandwidth adaptation. To comprehensively validate our framework, we conduct extensive experiments on both real-world and simulated datasets. The results demonstrate the superior performance of our model and highlight the necessity of the proposed methods. Our code is available at https://github.com/Zhangzhengbin-TJ/Fast2comm.

artificial intelligence, machine learning, perception, (18 more...)

arXiv.org Artificial Intelligence

2505.0074

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.96)

Add feedback

Towards Efficient Roadside LiDAR Deployment: A Fast Surrogate Metric Based on Entropy-Guided Visibility

Jiang, Yuze, Javanmardi, Ehsan, Tsukada, Manabu, Esaki, Hiroshi

arXiv.org Artificial IntelligenceApr-10-2025

The deployment of roadside LiDAR sensors plays a crucial role in the development of Cooperative Intelligent Transport Systems (C-ITS). However, the high cost of LiDAR sensors necessitates efficient placement strategies to maximize detection performance. Traditional roadside LiDAR deployment methods rely on expert insight, making them time-consuming. Automating this process, however, demands extensive computation, as it requires not only visibility evaluation but also assessing detection performance across different LiDAR placements. To address this challenge, we propose a fast surrogate metric, the Entropy-Guided Visibility Score (EGVS), based on information gain to evaluate object detection performance in roadside LiDAR configurations. EGVS leverages Traffic Probabilistic Occupancy Grids (TPOG) to prioritize critical areas and employs entropy-based calculations to quantify the information captured by LiDAR beams. This eliminates the need for direct detection performance evaluation, which typically requires extensive labeling and computational resources. By integrating EGVS into the optimization process, we significantly accelerate the search for optimal LiDAR configurations. Experimental results using the AWSIM simulator demonstrate that EGVS strongly correlates with Average Precision (AP) scores and effectively predicts object detection performance. This approach offers a computationally efficient solution for roadside LiDAR deployment, facilitating scalable smart infrastructure development.

artificial intelligence, configuration, placement, (15 more...)

arXiv.org Artificial Intelligence

2504.06772

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.29)

Genre: Research Report (0.82)

Industry: Transportation > Ground > Road (0.47)

Technology:

Information Technology > Sensing and Signal Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.94)

Add feedback

Fresh2comm: Information Freshness Optimized Collaborative Perception

Wu, Ziyong, Peng, Zhilin, Yu, Lei

arXiv.org Artificial IntelligenceFeb-11-2025

Collaborative perception is a cornerstone of intelligent connected vehicles, enabling them to share and integrate sensory data to enhance situational awareness. However, measuring the impact of high transmission delay and inconsistent delay on collaborative perception in real communication scenarios, as well as improving the effectiveness of collaborative perception under such conditions, remain significant challenges in the field. To address these challenges, we incorporate the key factor of information freshness into the collaborative perception mechanism and develop a model that systematically measures and analyzes the impacts of real-world communication on collaborative perception performance. This provides a new perspective for accurately evaluating and optimizing collaborative perception performance. We propose and validate an Age of Information (AoI)-based optimization framework that strategically allocates communication resources to effectively control the system's AoI, thereby significantly enhancing the freshness of information transmission and the accuracy of perception. Additionally, we introduce a novel experimental approach that comprehensively assesses the varying impacts of different types of delay on perception results, offering valuable insights for perception performance optimization under real-world communication scenarios.

artificial intelligence, machine learning, perception, (17 more...)

arXiv.org Artificial Intelligence

2502.07852

Country: